function [pval, zscore] = SigClustSM(data,paramstruct)
% SIGCLUSTSM, statistical SIGnificance of CLUSTers
%   Steve Marron's matlab function
%     Studies whether or not clusters are really there,
%     using 2-means (k = 2) clustering index as a statistic.
%     Gets significance by simulation from a single Gaussian
%     distribution.  Gaussian parameters are estimated from
%     the data.  Some diagnostics are shown to study performance.
%         Note:  for testing only whether 2 given classes are different,
%                recommended test is DiProPermSM.m
%                       (DIrection-PROjection-PERMutation)
%                The present test focusses on clusters, in the special sense
%                of coming from a single Gaussian
%
% Inputs:
%     data    - d x n matrix of data, each column vector is
%                    a "d-dim data vector"
%
%   paramstruct - a Matlab structure of input parameters
%                    Use: "help struct" and "help datatypes" to
%                         learn about these.
%                    Create one, using commands of the form:
%
%       paramstruct = struct('field1',values1,...
%                            'field2',values2,...
%                            'field3',values3) ;
%
%                          where any of the following can be used,
%                          these are optional, misspecified values
%                          revert to defaults
%
%    fields            values
%
%    vclass           Vector of 1s and 2s, which indicates given class labels
%                     (grouping to be tested for significance).
%                     When this is not given, or set to scalar 0, use labels
%                     generated by 2-means clustering
%                     Must be a row vector, of length n,
%                     otherwise execution is terminated
%
%    sigbackg         Background Standard Deviation, given value.
%                     When this is not given, or set to scalar 0,
%                     estimate from Data.
%
%    iCovEst          Covariance Estimation Type
%                     1 - Use Hanwen Huang's Soft Thresholded, Constrained MLE
%                             (default, and recommended)
%                             (from Huang, Liu, Yuan, Marron 2014 JCGS paper)
%                     2 - Use Sample Covariance Estimate
%                             (recommended when diagnostics fail)
%                     3 - Use Original Background Noise Thresholded Estimate
%                             (from Liu, et al, 2008 JASA paper)
%                             (In Yuan's terminology, this is "Hard Thresholding")
%                             (this is anti-conservative for a few
%                                  very large eigenvalues)
%                     4 - Use Ming Yuan's Soft Thresholded, Constrained MLE
%                             (from Ming Yuan's notes, 2007)
%                             (this is anti-conservative when the first
%                                  eigenvalue is relatively small)
%
%    ipvaltype        Type of p-value to compute
%                     1 - Only p-value (empirical quantile)
%                     2 - Only Z-score (useful for comparisons, when
%                             p-values are all 0)
%                     3 - (default)  Both p-value (empirical quantile)
%                             and Z-score
%                     (note: uncomputed p-values are returned as empty,
%                            and not shown in output plots)
%
%    twoMtype         Type of 2-means clusterings to compute
%                     1 - use simple random restarts
%                             (i.e. make calls to SigClust2meanRepSM.m)
%                     2 (default) - use PCA starts
%                             (i.e. make calls to SigClust2meanFastSM.m)
%
%    twoMsteps        Number of steps to use in 2 means clustering
%                     computations
%                     (default = 1, chosen to optimize speed)
%
%    nsim             Number of simulated Gaussian samples to use
%                     for main p-value computation
%                     (default = 1000)
%
%    InitRandstate    State of uniform random number generator,
%                     for Initial Calculation of Data Cluster Index
%                     When empty, or not specified, just use current seed
%                     (has no effect, unless twoMtype = 1 & vclass = 0)
%
%    InitRandnstate   State of normal random number generator,
%                     for Initial Calculation of Data Cluster Index
%                     When empty, or not specified, just use current seed
%                     (has no effect, unless twoMtype = 1 & vclass = 0)
%
%    SimRandstate     State of uniform random number generator,
%                     for Main Simulation
%                     When empty, or not specified, just use current seed
%                     (has no effect, unless twoMtype = 1)
%
%    SimRandnstate    State of normal random number generator,
%                     for Main Simulation
%                     When empty, or not specified, just use current seed
%                     (has no effect, unless twoMtype = 1)
%
%    datastr          String Descriptive of data set,
%                     mostly for use in graphical output,
%                     when these are requested.
%                     Often may want to end this with "data"
%                     (default is [])
%
%    iBGSDdiagplot    0  do not make BackGround Standard Deviation
%                            DIAGnostic PLOTs
%                     1  (default) make BackGround Standard Deviation
%                            DIAGnostic PLOTs:
%                                i.  Overlay and density plot of pixel dist'n,
%                                        with robust Gaussian fit
%                                ii.  QQ plot assessing quality of robust fit
%                                        of Gaussian Distribution
%                            This has no effect, unless sigbackg == 0
%
%    BGSDsavestr      string controlling saving of output for BackGround
%                         Standard Deviation DIAGnostic PLOTs,
%                         either a full path, or a file prefix to
%                         save in matlab's current directory.
%                         Probably makes sense to end with
%                         something like:    'AllPixel'
%                         Will add:
%                             'KDE' for plot i.
%                             'QQ' for plot ii.
%                         Will also add .ps (so don't do this again!),
%                             and save as color postscript files.
%                     Unspecified (or empty):
%                         Results only appear on screen
%                     Only has effect when iBGSDdiagplot = 1
%
%    iCovEdiagplot    0  do not make COVariance Estimation
%                            DIAGnostic PLOT
%                     1  (default) make COVariance Estimation
%                            DIAGnostic PLOT
%
%    CovEsavestr      string controlling saving of output for
%                         COVariance Estimation DIAGnostic PLOTs,
%                         either a full path, or a file prefix to
%                         save in matlab's current directory.
%                         Probably makes sense to end with
%                         something like:    'EstEigVal'
%                         Will also add .ps (so don't do this again!),
%                             and save as color postscript files.
%                     Unspecified (or empty):
%                         Results only appear on screen
%                     Only has effect when iCovEdiagplot = 1
%
%    ipValplot        0  do not make p Value plot
%                     1  (default) make p Value plot
%
%    pValsavestr      string controlling saving of output for
%                         p Value plot,
%                         either a full path, or a file prefix to
%                         save in matlab's current directory.
%                         Probably makes sense to end with
%                         something like:    'pVal'
%                         Will also add .ps (so don't do this again!),
%                             and save as color postscript files.
%                     Unspecified (or empty):
%                         Results only appear on screen
%                     Only has effect when ipValplot = 1
%
%    legendcellstr    For p Value Plot:
%                     cell array of strings for legend (nl of them),
%                     useful for (colored) classes, create this using
%                     cellstr, or {{string1 string2 ...}}
%                     Note:  These strange double brackets seem to be needed
%                            for correct pass to subroutine
%                            It may change in later versions of Matlab
%                     CAUTION:  If you are updating this field, using a
%                            command like:
%                                paramstruct = setfield(paramstruct,'legendcellstr',...
%                            then you should only use single braces in the
%                            definition of legendcellstr,
%                            i.e. {string1 string2 ...}
%                     Suggested uses:
%                         For vclass = 0:
%                             legendcellstr = {{'Testing Best 2-means Split'}}
%                         For given vclass:
%                             legendcellstr = {{'Class 1' 'vs.' 'Class 2'}}
%                                 then use mlegendcolor appropriately,
%                                 e.g. mlegendcolor = [[1 0 0]; [0 0 0]; [0 0 1]]
%                     Only has effect when ipValplot = 1
%
%    mlegendcolor     For p Value Plot:
%                     nl x 3 color matrix, corresponding to cell legends above
%                     (not needed when legendcellstr not specified)
%                     (defaults to black when not specified)
%                     Only has effect when ipValplot = 1
%
%    iscreenwrite     0  (default)  no screen writes
%                                       except warning messages
%                     1  write to screen to show major steps
%                     2  to show most steps, but only one write at
%                            each step of main simulation loop
%                     3  to show all steps, not recommended since it
%                            makes multiple writes for each step of
%                            main simulation loop
%
%
% Outputs:
%
%     pval            Simulated SigClust P-value,
%                         Based on Empirical Quantiles,
%                         Computed by cprobSM.m
%
%     zscore          Z-score summary (in permutation population),
%                         useful for comparisons, when pval = 0
%
%     Graphics in different Figure windows
%     When ___savestr exists,
%        Color Postscript files saved in '___savestr'.ps
%
% Assumes path can find personal functions:
%    SigClust2meanRepSM.m
%    SigClust2meanFastSM.m
%    SigClustCovEstHH.m
%    SigClustCovEstSM.m
%    SigClustLabelPlotSM.m
%    SigClust2meanPerfSM.m
%    ClustIndSM.m
%    vec2matSM.m
%    axisSM.m
%    scatplotSM.m
%    bwsjpiSM.m
%    kdeSM.m
%    lbinrSM.m
%    pcaSM.m
%    projplot1SM.m
%    projplot2SM.m
%    bwrfphSM.m
%    bwosSM.m
%    rootfSM.m
%    bwrotSM.m
%    bwsnrSM.m
%    iqrSM.m
%    madSM.m
%    qqLM.m
%    cprobSM.m
%    cquantSM.m
%    nmfSM.m

%    Copyright (c) J. S. Marron 2007,2008,2009,2014



%  First set all parameters to defaults
%
vclass = 0 ;
sigbackg = 0 ;
iCovEst = 1 ;
ipvaltype = 3 ;
twoMtype = 2 ;
twoMsteps = 1 ;
nsim = 1000 ;
InitRandstate = [] ;
InitRandnstate = [] ;
SimRandstate = [] ;
SimRandnstate = [] ;
datastr = [] ;
iBGSDdiagplot = 1 ;
BGSDsavestr = [] ;
iCovEdiagplot = 1 ;
CovEsavestr = [] ;
ipValplot = 1 ;
pValsavestr = [] ;
legendcellstr = {} ;
mlegendcolor = [] ;
iscreenwrite = 0 ;



%  Now update parameters as specified,
%  by parameter structure (if it is used)
%
if nargin > 1 ;   %  then paramstruct is an argument

  if isfield(paramstruct,'vclass') ;    %  then change to input value
    vclass = getfield(paramstruct,'vclass') ;
  end ;

  if isfield(paramstruct,'sigbackg') ;    %  then change to input value
    sigbackg = getfield(paramstruct,'sigbackg') ;
  end ;

  if isfield(paramstruct,'iCovEst') ;    %  then change to input value
    iCovEst = getfield(paramstruct,'iCovEst') ;
  end ;

  if isfield(paramstruct,'ipvaltype') ;    %  then change to input value
    ipvaltype = getfield(paramstruct,'ipvaltype') ;
  end ;

  if isfield(paramstruct,'twoMtype') ;    %  then change to input value
    twoMtype = getfield(paramstruct,'twoMtype') ;
  end ;

  if isfield(paramstruct,'twoMsteps') ;    %  then change to input value
    twoMsteps = getfield(paramstruct,'twoMsteps') ;
  end ;

  if isfield(paramstruct,'nsim') ;    %  then change to input value
    nsim = getfield(paramstruct,'nsim') ;
  end ;

  if isfield(paramstruct,'InitRandstate') ;    %  then change to input value
    InitRandstate = getfield(paramstruct,'InitRandstate') ;
  end ;

  if isfield(paramstruct,'InitRandnstate') ;    %  then change to input value
    InitRandnstate = getfield(paramstruct,'InitRandnstate') ;
  end ;

  if isfield(paramstruct,'SimRandstate') ;    %  then change to input value
    SimRandstate = getfield(paramstruct,'SimRandstate') ;
  end ;

  if isfield(paramstruct,'SimRandnstate') ;    %  then change to input value
    SimRandnstate = getfield(paramstruct,'SimRandnstate') ;
  end ;

  if isfield(paramstruct,'datastr') ;    %  then change to input value
    datastr = getfield(paramstruct,'datastr') ;
  end ;

  if isfield(paramstruct,'iBGSDdiagplot') ;    %  then change to input value
    iBGSDdiagplot = getfield(paramstruct,'iBGSDdiagplot') ;
  end ;

  if isfield(paramstruct,'BGSDsavestr') ;    %  then change to input value
    BGSDsavestr = getfield(paramstruct,'BGSDsavestr') ;
    if ~(ischar(BGSDsavestr) | isempty(BGSDsavestr)) ;
                          %  then invalid input, so give warning
      disp('!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!') ;
      disp('!!!   Warning from SigClustSM.m:   !!!') ;
      disp('!!!   Invalid BGSDsavestr,         !!!') ;
      disp('!!!   using default of no save     !!!') ;
      disp('!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!') ;
      BGSDsavestr = [] ;
    end ;
  end ;

  if isfield(paramstruct,'iCovEdiagplot') ;    %  then change to input value
    iCovEdiagplot = getfield(paramstruct,'iCovEdiagplot') ;
  end ;

  if isfield(paramstruct,'CovEsavestr') ;    %  then change to input value
    CovEsavestr = getfield(paramstruct,'CovEsavestr') ;
    if ~(ischar(CovEsavestr) | isempty(CovEsavestr)) ;
                          %  then invalid input, so give warning
      disp('!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!') ;
      disp('!!!   Warning from SigClustSM.m:   !!!') ;
      disp('!!!   Invalid CovEsavestr,         !!!') ;
      disp('!!!   using default of no save     !!!') ;
      disp('!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!') ;
      CovEsavestr = [] ;
    end ;
  end ;

  if isfield(paramstruct,'ipValplot') ;    %  then change to input value
    ipValplot = getfield(paramstruct,'ipValplot') ;
  end ;

  if isfield(paramstruct,'pValsavestr') ;    %  then change to input value
    pValsavestr = getfield(paramstruct,'pValsavestr') ;
    if ~(ischar(pValsavestr) | isempty(pValsavestr)) ;
                          %  then invalid input, so give warning
      disp('!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!') ;
      disp('!!!   Warning from SigClustSM.m:   !!!') ;
      disp('!!!   Invalid pValsavestr,         !!!') ;
      disp('!!!   using default of no save     !!!') ;
      disp('!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!') ;
      pValsavestr = [] ;
    end ;
  end ;

  if isfield(paramstruct,'legendcellstr') ;    %  then change to input value
    legendcellstr = getfield(paramstruct,'legendcellstr') ;
  end ;

  if isfield(paramstruct,'mlegendcolor') ;    %  then change to input value
    mlegendcolor = getfield(paramstruct,'mlegendcolor') ;
  end ;

  if isfield(paramstruct,'iscreenwrite') ;    %  then change to input value
    iscreenwrite = getfield(paramstruct,'iscreenwrite') ;
  end ;

end ;    %  of resetting of input parameters



%  set preliminary stuff
%
d = size(data,1) ;
         %  dimension of each data curve
n = size(data,2) ;
         %  number of data curves
pval = [] ;
zscore = [] ;

CurFigNum = 0 ;

if iscreenwrite == 2 ;
  iscreenwritemost = 1 ;
  iscreenwriteall = 0 ;
      %  for use as arguments in other calls
elseif iscreenwrite == 3 ;
  iscreenwritemost = 1 ;
  iscreenwriteall = 1 ;
      %  for use as arguments in other calls
else ;
  iscreenwritemost = 0 ;
  iscreenwriteall = 0 ;
      %  for use as arguments in other calls
end ;

if isempty(vclass) ;
  disp('!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!') ;
  disp('!!!   Warning from SigClustSM.m:     !!!') ;
  disp('!!!   vclass is empty                !!!') ;
  disp('!!!   Resetting to vclass = 0        !!!') ;
  disp('!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!') ;
  vclass = 0 ;
end ;

if vclass ~= 0 ;

  if  (size(vclass,1) ~= 1)  |  (size(vclass,2) ~= n)  ;
    disp('!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!') ;
    disp('!!!   Error from SigClustSM.m:          !!!') ;
    disp('!!!   vclass needs to be a row vector   !!!') ;
    disp(['!!!   of length ' num2str(n)]) ;
    disp('!!!   Terminating Execution             !!!') ;
    disp('!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!') ;
    return ;
  end ;

  flag1 = (vclass == 1) ;
  flag2 = (vclass == 2) ;
  if (sum(flag1) + sum(flag2)) ~= n ;
    disp('!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!') ;
    disp('!!!   Error from SigClustSM.m:                !!!') ;
    disp('!!!   vclass needs to contain all 1s and 2s   !!!') ;
    disp('!!!   Terminating Execution                   !!!') ;
    disp('!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!') ;
    return ;
  end ;

end ;

if ~isempty(mlegendcolor) ;
  if ~(size(legendcellstr,2) == size(mlegendcolor,1)) ;
    disp('!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!') ;
    disp('!!!   Warning from SigClustSM.m:        !!!') ;
    disp('!!!   legendcellstr & mlegendcolor      !!!') ;
    disp('!!!   must have the same number         !!!') ;
    disp('!!!   of entries                        !!!') ;
    disp('!!!   Note: this could be caused by     !!!') ;
    disp('!!!   using "setfield(paramstruct..."   !!!') ;
    disp('!!!   with "legendcellstr" defined      !!!') ;
    disp('!!!   by double braces                  !!!') ;
    disp('!!!   Resetting to no legend            !!!') ;
    disp('!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!') ;
    legendcellstr = [] ;
  end ;
end ;

if ~isempty(mlegendcolor) ;
  if ~(size(mlegendcolor,2) == 3) ;
    disp('!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!') ;
    disp('!!!   Warning from SigClustSM.m:    !!!') ;
    disp('!!!   mlegendcolor                  !!!') ;
    disp('!!!   must have 3 columns           !!!') ;
    disp('!!!   Resetting to no legend        !!!') ;
    disp('!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!') ;
    legendcellstr = [] ;
  end ;
end ;



%  Compute Initial Cluster Index for Data
%
if vclass == 0 ;    %  No initial clustering given, so compute from data

  if twoMtype == 1 ;    %  Use Random Restarts for computation

    paramstruct = struct('nrep',twoMsteps, ...
                         'randstate',InitRandstate, ...
                         'randnstate',InitRandnstate, ...
                         'iscreenwrite',iscreenwritemost) ;
    [temp, vindex] = SigClust2meanRepSM(data,paramstruct) ;
        %  Ignore usual output of BestClass (Cluster Labelling)
    ClustIndData = min(vindex) ;
        %  Since SigClust2meanRepSM returns full vector of Clust Indices

  elseif twoMtype == 2 ;    %  Use PCA starts for computation

    paramstruct = struct('maxstep',twoMsteps, ...
                         'ioutplot',0, ...
                         'iscreenwrite',iscreenwritemost) ;
    [temp, ClustIndData] = SigClust2meanFastSM(data,paramstruct) ;
        %  Ignore usual output of BestClass (Cluster Labelling)

  else ;    %  Misspecified Computation Type

    disp('!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!') ;
    disp('!!!   Error from SigClustSM.m:    !!!') ;
    disp('!!!   Invalid value of twoMtype   !!!') ;
    disp(['!!!   twoMtype = ' num2str(twoMtype)]) ;
    disp('!!!   Terminating Execution       !!!') ;
    disp('!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!') ;
    return ;

  end ;

else ;    %  Were given an initial clustering, so compute ClustInd for that

  ClustIndData = ClustIndSM(data,(vclass == 1),(vclass == 2)) ;

end ;



%  Estimate covariance structure of data
%
%  Start with sample eigenvalues
%
paramstruct = struct('iscreenwrite',iscreenwritemost, ...
                     'viout',[1]) ;
outstruct = pcaSM(data,paramstruct) ;
veigval = getfield(outstruct,'veigval') ;
nev = length(veigval) ;
veigval = [veigval; zeros(d-nev,1)] ;
    %  Pad out to have length d

varflag = 0 ;
    %  warning flag about possible bad background estimation
if iCovEst ~= 2 ;    %  then need to consider thresholding

  npix = d * n ;
  vdata = reshape(data,npix,1) ;
      %  all pixel data
  asd = std(vdata) ;
  avar = asd^2 ;

  if sigbackg == 0 ;    %  Need to estimate background standard deviation

    amean = mean(vdata) ;
    amedian = median(vdata) ;
    amad = madSM(vdata) ;
        %  MAD, but on sd scale

    if iBGSDdiagplot ~= 0 ;    %  Then make BackGround Standard Deviation
                               %      DIAGnostic PLOTs

      CurFigNum = CurFigNum + 1 ;
      figure(CurFigNum) ;
      clf ;
      maxnol = 5000 ;
      paramstruct = struct('ndatovlay',maxnol, ...
                           'datovlaymax',0.65, ...
                           'datovlaymin',0.15, ...
                           'iscreenwrite',iscreenwritemost) ;
      kdeSM(vdata,paramstruct) ;
      if isempty(datastr) ;
        titstr = 'Distribution of All Pixel values combined' ;
      else ;
        titstr = ['Distribution of All Pixel values combined, ' datastr] ;
      end ;
      title(titstr,'FontSize',15) ;
      vax = axis ;
      hold on ;
        xgrid = linspace(vax(1),vax(2),401)' ;
        normden = nmfSM(xgrid,amedian,amad^2,1) ;
        plot(xgrid,normden,'r--') ;
        if length(vdata) < maxnol ;
          olstr = ['Overlay of ' ...
                   num2str(length(vdata)) ' data points'] ;
        else ;
          olstr = ['Overlay of ' num2str(maxnol) ' of ' ...
                   num2str(length(vdata)) ' data points'] ;
        end ;
        text(vax(1) + 0.1 * (vax(2) - vax(1)), ...
             vax(3) + 0.9 * (vax(4) - vax(3)), ...
             olstr,'FontSize',15) ;
        text(vax(1) + 0.1 * (vax(2) - vax(1)), ...
             vax(3) + 0.8 * (vax(4) - vax(3)), ...
             ['Mean = ' num2str(amean,3) ...
              ',  median = ' num2str(amedian,3)], ...
             'FontSize',15) ;
        text(vax(1) + 0.1 * (vax(2) - vax(1)), ...
             vax(3) + 0.7 * (vax(4) - vax(3)), ...
             ['s.d. = ' num2str(asd,3) ...
              ',  MAD = ' num2str(amad,3)], ...
             'FontSize',15) ;
        text(vax(1) + 0.1 * (vax(2) - vax(1)), ...
             vax(3) + 0.6 * (vax(4) - vax(3)), ...
             ['Gaussian(' num2str(amedian,3) ...
              ',' num2str(amad,3) ') density'], ...
             'FontSize',15, 'Color','r') ;
        if amad > asd ;
          text(vax(1) + 0.1 * (vax(2) - vax(1)), ...
               vax(3) + 0.45 * (vax(4) - vax(3)), ...
               ['Warning: MAD > s.d., non-Gaussian Background'], ...
               'FontSize',18, 'Color','m') ;
        end ;
      hold off ;

      if ~isempty(BGSDsavestr) ;
        orient landscape ;
        savestr = [BGSDsavestr 'KDE.ps'] ;
        print('-dpsc2',savestr) ;
      end ;

      CurFigNum = CurFigNum + 1 ;
      figure(CurFigNum) ;
      clf ;
      if isempty(BGSDsavestr) ;
        savestr = [] ;
            %  no save requested, so results only appear on screen
            %  (assumes qqLM treats an empty savestr as "no save")
      else ;
        savestr = [BGSDsavestr 'QQ'] ;
      end ;
      if isempty(datastr) ;
        titstr = 'Robust Fit Gaussian Q-Q, All Pixel values' ;
      else ;
        titstr = ['Robust Fit Gaussian Q-Q, All Pixel values, ' datastr] ;
      end ;
      paramstruct = struct('idist',1, ...
                           'mu',amedian, ...
                           'sigma',amad, ...
                           'nsim',0, ...
                           'nsimplotval',900, ...
                           'savestr',savestr, ...
                           'titlestr',titstr, ...
                           'titlefontsize',15, ...
                           'labelfontsize',15, ...
                           'parfontsize',15, ...
                           'ishowpar',1, ...
                           'vshowq',[0.25; 0.5; 0.75], ...
                           'iscreenwrite',iscreenwritemost) ;
      qqLM(vdata,paramstruct) ;

    end ;    %  of iBGSDdiagplot if-block

    isigbackg = amad ;

  else ;    %  Use input value of background standard deviation

    isigbackg = sigbackg ;

  end ;    %  of sigbackg if-block

  simbackvar = isigbackg^2 ;
      %  Background variance to use in simulations

  %  Check whether background estimate is sensible,
  %  in sense of being smaller than sample variance
  if simbackvar > avar ;    %  then have surprising background var est,
                            %  so give warning, and recommendations
    varflag = 1 ;
    disp('!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!') ;
    disp('!!!   Warning from SigClustSM.m:                    !!!') ;
    disp('!!!   Background Variance Estimate is Suspect       !!!') ;
    disp('!!!   Because (MAD estimated sig2BG) >              !!!') ;
    disp('!!!       (full matrix sample variance),            !!!') ;
    disp(['!!!   by factor of ' num2str(simbackvar / avar)]) ;
    disp('!!!   Non-Gaussian Background Data                  !!!') ;
    disp('!!!   Recommend Careful look at Diagnostic plots,   !!!') ;
    disp('!!!   using: iBGSDdiagplot = 1                      !!!') ;
    disp('!!!   Consider re-running SigClust test,            !!!') ;
    disp('!!!   using: iCovEst = 2                            !!!') ;
    disp('!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!') ;
  end ;

  %  Compute eigenvalues to use in Gaussian Simulation
  %
  if iCovEst == 3 ;    %  Use Original Background Noise Thresholded Estimate
                       %      (from Liu, et al, JASA paper)
                       %      (In Yuan's terminology, this is "Hard Thresholding")
    vsimeigval = max(veigval,simbackvar) ;
        %  vector of eigenvalues for simulation

  elseif iCovEst == 4 ;    %  Use Ming Yuan's Soft Thresholded, Constrained MLE

    vsimeigval = SigClustCovEstSM(veigval,simbackvar) ;
        %  vector of eigenvalues for simulation

  else ;    %  Use Hanwen Huang's Soft Thresholded, Constrained MLE

    vsimeigval = SigClustCovEstHH(veigval,simbackvar) ;
        %  vector of eigenvalues for simulation

  end ;

else ;    %  no adjustment to eigenvalues, just use empiricals

  vsimeigval = veigval ;
      %  vector of eigenvalues for simulation
      %  based on raw covariance type model

end ;    %  of (iCovEst ~= 2) if-block


if iCovEdiagplot == 1 ;    %  Then make COVariance Estimation DIAGnostic PLOT

  ncut = 100 ;

  CurFigNum = CurFigNum + 1 ;
  figure(CurFigNum) ;
  clf ;

  veigvalpos = veigval(veigval > 10^(-12)) ;
  dpos = length(veigvalpos) ;

  subplot(2,2,1) ;
    plot((1:d)',veigval,'ko-') ;
    if isempty(datastr) ;
      titstr = 'Eigenvalues' ;
    else ;
      titstr = ['Eigenvalues, ' datastr] ;
    end ;
    title(titstr,'FontSize',12) ;
    xlabel('Component #','FontSize',12) ;
    ylabel('Eigenvalue','FontSize',12) ;
    if iCovEst == 2 ;
      vax = axisSM(veigval) ;
    else ;
      vax = axisSM([veigval; simbackvar]) ;
    end ;
    vax = [0 (d+1) vax(1) vax(2)] ;
    axis(vax) ;
    hold on ;
      plot([ncut + 0.5; ncut + 0.5],[vax(3); vax(4)],'g-') ;
      plot((1:d)',vsimeigval,'r--','LineWidth',2) ;
      if iCovEst ~= 2 ;
        plot([0; d + 1],[simbackvar; simbackvar],'m-') ;
        text(vax(1) + 0.1 * (vax(2) - vax(1)), ...
             vax(3) + 0.9 * (vax(4) - vax(3)), ...
             ['Background variance = ' num2str(simbackvar,3)], ...
             'FontSize',12,'Color','m') ;
      end ;
      text(vax(1) + 0.1 * (vax(2) - vax(1)), ...
           vax(3) + 0.8 * (vax(4) - vax(3)), ...
           'Eigenvalues for Simulation', ...
           'FontSize',12,'Color','r') ;
      if varflag == 1 ;
        text(vax(1) + 0.1 * (vax(2) - vax(1)), ...
             vax(3) + 0.65 * (vax(4) - vax(3)), ...
             ['Warning: MAD > s.d.'], ...
             'FontSize',15,'Color','m') ;
      end ;
    hold off ;

  subplot(2,2,2) ;
    plot((1:dpos)',log10(veigvalpos),'ko-') ;
    title('log_{10} Eigenvalues','FontSize',12) ;
    xlabel('Component #','FontSize',12) ;
    ylabel('log_{10}(Eigenvalue)','FontSize',12) ;
    if iCovEst == 2 ;
      vax = axisSM(log10(veigvalpos)) ;
    else ;
      vax = axisSM(log10([veigvalpos; simbackvar])) ;
    end ;
    vax = [0 (d+1) vax(1) vax(2)] ;
    axis(vax) ;
    hold on ;
      plot([ncut + 0.5; ncut + 0.5],[vax(3); vax(4)],'g-') ;
      plot((1:d)',log10(vsimeigval),'r--','LineWidth',2) ;
      if iCovEst ~= 2 ;
        plot([0; d + 1],log10([simbackvar; simbackvar]),'m-') ;
        text(vax(1) + 0.1 * (vax(2) - vax(1)), ...
             vax(3) + 0.9 * (vax(4) - vax(3)), ...
             ['log_{10} Background variance = ' num2str(log10(simbackvar),3)], ...
             'FontSize',12,'Color','m') ;
      end ;
      text(vax(1) + 0.1 * (vax(2) - vax(1)), ...
           vax(3) + 0.8 * (vax(4) - vax(3)), ...
           'Eigenvalues for Simulation', ...
           'FontSize',12,'Color','r') ;
      if varflag == 1 ;
        text(vax(1) + 0.1 * (vax(2) - vax(1)), ...
             vax(3) + 0.65 * (vax(4) - vax(3)), ...
             ['Non-Gaussian Background'], ...
             'FontSize',15,'Color','m') ;
      end ;
    hold off ;

  if length(veigval) >= ncut ;

    subplot(2,2,3) ;
      plot((1:ncut)',veigval(1:ncut),'ko-') ;
      title('Zoomed in version of above','FontSize',12) ;
      xlabel('Component #','FontSize',12) ;
      ylabel('Eigenvalue','FontSize',12) ;
      vax = axisSM(veigval(1:ncut)) ;
      vax = [0 (ncut+1) vax(1) vax(2)] ;
      axis(vax) ;
      hold on ;
        plot((1:ncut)',vsimeigval(1:ncut),'r--','LineWidth',2) ;
        if iCovEst ~= 2 ;
          plot([0; ncut + 1],[simbackvar; simbackvar],'m-') ;
          text(vax(1) + 0.1 * (vax(2) - vax(1)), ...
               vax(3) + 0.9 * (vax(4) - vax(3)), ...
               ['Background variance = ' num2str(simbackvar,3)], ...
               'FontSize',12,'Color','m') ;
        end ;
        text(vax(1) + 0.1 * (vax(2) - vax(1)), ...
             vax(3) + 0.8 * (vax(4) - vax(3)), ...
             'Eigenvalues for Simulation', ...
             'FontSize',12,'Color','r') ;
      hold off ;

    subplot(2,2,4) ;
      plot((1:min(dpos,ncut))',log10(veigvalpos(1:min(dpos,ncut))),'ko-') ;
      title('Zoomed in version of above','FontSize',12) ;
      xlabel('Component #','FontSize',12) ;
      ylabel('log_{10}(Eigenvalue)','FontSize',12) ;
      if iCovEst == 2 ;
        vax = axisSM(log10(veigvalpos(1:min(dpos,ncut)))) ;
      else ;
        vax = axisSM(log10([veigvalpos(1:min(dpos,ncut)); simbackvar])) ;
      end ;
      vax = [0 (ncut+1) vax(1) vax(2)] ;
      axis(vax) ;
      hold on ;
        plot((1:ncut)',log10(vsimeigval(1:ncut)),'r--','LineWidth',2) ;
        if iCovEst ~= 2 ;
          plot([0; d + 1],log10([simbackvar; simbackvar]),'m-') ;
          text(vax(1) + 0.1 * (vax(2) - vax(1)), ...
               vax(3) + 0.9 * (vax(4) - vax(3)), ...
               ['log_{10} Background variance = ' num2str(log10(simbackvar),3)], ...
               'FontSize',12,'Color','m') ;
        end ;
        text(vax(1) + 0.1 * (vax(2) - vax(1)), ...
             vax(3) + 0.8 * (vax(4) - vax(3)), ...
             'Eigenvalues for Simulation', ...
             'FontSize',12,'Color','r') ;
      hold off ;

  end ;

  if ~isempty(CovEsavestr) ;
    orient landscape ;
    savestr = [CovEsavestr '.ps'] ;
    print('-dpsc2',savestr) ;
  end ;

end ;    %  of iCovEdiagplot if-block

if (iscreenwrite == 1) | (iscreenwrite == 2) | (iscreenwrite == 3) ;
  if isempty(datastr) ;
    dispstr = 'SigClustSM: Finished Estimation of Parameters for Gaussian simulation' ;
  else ;
    dispstr = ['SigClustSM: Finished Estimation of Parameters for Gaussian simulation, for ' datastr] ;
  end ;
  disp(dispstr) ;
end ;



%  Main Simulation Loop
%
if ~isempty(SimRandstate) ;    %  Then set random number generation seed
  rand('state',SimRandstate) ;
end ;
if ~isempty(SimRandnstate) ;    %  Then set random number generation seed
  randn('state',SimRandnstate) ;
end ;

vscale = sqrt(vsimeigval) ;
vSimIndex = [] ;
for isim = 1:nsim ;

  if iscreenwrite == 2 | (iscreenwrite == 3) ;
    disp(['    SigClustSM: Working on sim ' num2str(isim) ' of ' num2str(nsim)]) ;
  end ;

  mdatsim = randn(d,n) ;
  mdatsim = mdatsim .* vec2matSM(vscale,n) ;

  if twoMtype == 1 ;    %  Use Random Restarts for computation

    paramstruct = struct('nrep',twoMsteps, ...
                         'iscreenwrite',iscreenwriteall) ;
    [temp, vindex] = SigClust2meanRepSM(mdatsim,paramstruct) ;
        %  Ignore usual output of BestClass (Cluster Labelling)
    ClustIndSim = min(vindex) ;
        %  Since SigClust2meanRepSM returns full vector of Clust Indices

  elseif twoMtype == 2 ;    %  Use PCA starts for computation

    paramstruct = struct('maxstep',twoMsteps, ...
                         'ioutplot',0, ...
                         'iscreenwrite',iscreenwriteall) ;
    [temp, ClustIndSim] = SigClust2meanFastSM(mdatsim,paramstruct) ;
        %  Ignore usual output of BestClass (Cluster Labelling)

  else ;    %  Misspecified Computation Type

    disp('!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!') ;
    disp('!!!   Error from SigClustSM.m:    !!!') ;
    disp('!!!   Invalid value of twoMtype   !!!') ;
    disp(['!!!   twoMtype = ' num2str(twoMtype)]) ;
    disp('!!!   Terminating Execution       !!!') ;
    disp('!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!') ;
    return ;

  end ;

  vSimIndex = [vSimIndex; ClustIndSim] ;

end ;

if (iscreenwrite == 1) | (iscreenwrite == 2) | (iscreenwrite == 3) ;
  if isempty(datastr) ;
    dispstr = 'SigClustSM: Finished main simulation' ;
  else ;
    dispstr = ['SigClustSM: Finished main simulation, for ' datastr] ;
  end ;
  disp(dispstr) ;
end ;



%  Output Final results
%
if ipvaltype ~= 2 ;    %  then compute Empirical Quantile based p-value
  pval = cprobSM(vSimIndex,ClustIndData) ;
end ;

if ipvaltype ~= 1 ;    %  then compute Z score
  simmean = mean(vSimIndex) ;
  simsd = std(vSimIndex) ;
  zscore = (ClustIndData - simmean) / simsd ;
end ;

if ipValplot == 1 ;    %  Then make p Value plot

  CurFigNum = CurFigNum + 1 ;
  figure(CurFigNum) ;
  clf ;

  vax = axisSM([vSimIndex; ClustIndData]) ;
  if isempty(datastr) ;
    kdetitstr = [num2str(nsim) ' Gaussian Simulated Cluster Indices'] ;
  else ;
    kdetitstr = [num2str(nsim) ' Gaussian Simulated Cluster Indices, ' datastr] ;
  end ;
  kdeparamstruct = struct('vxgrid',vax, ...
                          'linecolor','k', ...
                          'dolcolor','k', ...
                          'ibigdot',1, ...
                          'titlestr',kdetitstr, ...
                          'titlefontsize',12, ...
                          'xlabelstr','Cluster Index', ...
                          'ylabelstr','density', ...
                          'labelfontsize',12, ...
                          'datovlaymin',0.4, ...
                          'datovlaymax',0.6, ...
                          'iscreenwrite',iscreenwritemost) ;
  kdeSM(vSimIndex,kdeparamstruct) ;
  vax = axis ;
  hold on ;
    plot([ClustIndData; ClustIndData],[vax(3); vax(4)],'Color',[0 0.6 0]) ;
    text(vax(1) + 0.5 * (vax(2) - vax(1)), ...
         vax(3) + 0.9 * (vax(4) - vax(3)), ...
         ['Data Cluster Index = ' num2str(ClustIndData)],'Color',[0 0.6 0]) ;
    if ipvaltype ~= 2 ;    %  then write Empirical Quantile based p-value
      text(vax(1) + 0.5 * (vax(2) - vax(1)), ...
           vax(3) + 0.8 * (vax(4) - vax(3)), ...
           ['p-val (Empirical) = ' num2str(pval)],'Color',[0 0.6 0]) ;
    end ;
    if ipvaltype ~= 1 ;    %  then write Z-score, and overlay Gaussian fit
      text(vax(1) + 0.5 * (vax(2) - vax(1)), ...
           vax(3) + 0.7 * (vax(4) - vax(3)), ...
           ['Z Score = ' num2str(zscore)],'Color',[0 0.6 0]) ;
      xgrid = linspace(vax(1),vax(2),401)' ;
      plot(xgrid,nmfSM(xgrid,simmean,simsd^2,1),'k') ;
    end ;
    if varflag == 1 ;
      text(vax(1) + 0.5 * (vax(2) - vax(1)), ...
           vax(3) + 0.55 * (vax(4) - vax(3)), ...
           ['Warning: MAD > s.d.,'], ...
           'FontSize',15,'Color','m') ;
      text(vax(1) + 0.5 * (vax(2) - vax(1)), ...
           vax(3) + 0.45 * (vax(4) - vax(3)), ...
           ['by factor of ' num2str(simbackvar / avar) ','], ...
           'FontSize',15,'Color','m') ;
      text(vax(1) + 0.2 * (vax(2) - vax(1)), ...
           vax(3) + 0.35 * (vax(4) - vax(3)), ...
           ['Non-Gaussian Background Data'], ...
           'FontSize',15,'Color','m') ;
    end ;

    if ~isempty(legendcellstr) ;    %  then add legend
      nlegend = length(legendcellstr) ;
      if isempty(mlegendcolor) ;
        mlegendcolor = vec2matSM(zeros(1,3),nlegend) ;
            %  all black when unspecified
      end ;
      tx = vax(1) + 0.1 * (vax(2) - vax(1)) ;
      for ilegend = 1:nlegend ;
        ty = 0 + ((nlegend - ilegend + 1) / ...
                  (nlegend + 1)) * (vax(4) - 0) ;
        text(tx,ty,legendcellstr(ilegend), ...
             'Color',mlegendcolor(ilegend,:)) ;
      end ;
    end ;
  hold off ;

  if ~isempty(pValsavestr) ;
    orient landscape ;
    savestr = [pValsavestr '.ps'] ;
    print('-dpsc2',savestr) ;
  end ;

end ;    %  of ipValplot if-block

if (iscreenwrite == 1) | (iscreenwrite == 2) | (iscreenwrite == 3) ;
  if isempty(datastr) ;
    if ipvaltype == 1 ;    %  then write Quantile based p-value
      dispstr = ['SigClustSM: Finished with p-value = ' num2str(pval)] ;
    elseif ipvaltype == 2 ;    %  then write Z-score
      dispstr = ['SigClustSM: Finished with Z-Score = ' num2str(zscore)] ;
    else ;    %  then write both p-value and Z-score
      dispstr = ['SigClustSM: Finished with p-value = ' num2str(pval) ...
                 ' & Z-Score = ' num2str(zscore)] ;
    end ;
  else ;
    if ipvaltype == 1 ;    %  then write Quantile based p-value
      dispstr = ['SigClustSM: Finished, for ' datastr ...
                 ', with p-value = ' num2str(pval)] ;
    elseif ipvaltype == 2 ;    %  then write Z-score
      dispstr = ['SigClustSM: Finished, for ' datastr ...
                 ', with Z-Score = ' num2str(zscore)] ;
    else ;    %  then write both p-value and Z-score
      dispstr = ['SigClustSM: Finished, for ' datastr ...
                 ', with p-value = ' num2str(pval) ...
                 ' & Z-Score = ' num2str(zscore)] ;
    end ;
  end ;
  disp(dispstr) ;
end ;
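


%  Usage sketch (illustrative only, kept as comments so that it is
%  never executed as part of this function).  It shows one way the
%  paramstruct described in the help text above might be built for
%  testing a given 2-class labelling.  The toy data, the class sizes,
%  and the parameter values are assumptions chosen for illustration,
%  and all of the personal functions listed above must be on the path.
%
%      data = [randn(5,20), (randn(5,20) + 4)] ;
%          %  hypothetical toy data: d = 5, n = 40,
%          %  second group of 20 columns shifted by 4
%      paramstruct = struct('vclass',[ones(1,20) 2*ones(1,20)], ...
%                           'nsim',100, ...
%                           'datastr','Toy data', ...
%                           'iscreenwrite',1) ;
%          %  vclass is a row vector of 1s and 2s, of length n
%      [pval, zscore] = SigClustSM(data,paramstruct) ;
%
%  Leaving out vclass (or setting it to scalar 0) would instead test
%  the labels found by 2-means clustering.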